Abstract

There are several scoring rules that one can choose from in order to score prob- 
abilistic forecasting models or estimate model parameters. Whilst it is generally 
agreed that proper scoring rules are preferable, there is no clear criterion for pre- 
ferring one proper scoring rule above another. This manuscript compares and con- 
trasts some commonly used proper scoring rules and provides guidance on scoring 
rule selection. In particular, it is shown that the logarithmic scoring rule prefers 
erring with more uncertainty, the spherical scoring rule prefers erring with lower 
uncertainty, whereas the other scoring rules are indifferent to either option. 

Keywords: estimation; forecast evaluation; probabilistic forecasting; utility function 

1 Introduction 

Issuing probabilistic forecasts is meant to express uncertainty about the future evo- 
lution of some quantity of interest. Such forecasts arise in many applications such 
as macroeconomics, finance, weather and climate forecasting. There are several scor- 
ing rules that one can choose from in order to elicit probabilistic forecasts, rank com- 
peting forecasting models or estimate forecast distribution parameters. It is generally 
agreed that one should select scoring rules that encourage a forecaster to state his 'best'
judgement of the distribution, the so-called proper scoring rules (Friedman, 1983; Nau,
1985; Gneiting and Raftery, 2007), but which one to use is generally an open question.
We shall take scoring rules to be loss functions that a forecaster wishes to minimise.
Scoring rules that are minimised if and only if the issued forecasts coincide with the
forecaster's best judgement are said to be strictly proper (Gneiting and Raftery, 2007;
Bröcker and Smith, 2007). We shall restrict our attention to strictly proper scoring
rules. 

Nonetheless, using scoring rules to rank competing forecasting models poses a problem:
scoring rules do not provide a universally acceptable ranking of performance. In
estimation, different scoring rules will yield different parameter estimates (Gneiting and
Raftery, 2007; Johnstone and Lin, 2011). Moreover, a forecaster's best judgement may depart
from the ideal; the ideal is a distribution that nature or the data generating process
would give (Gneiting et al., 2007). Although strictly proper scoring rules encourage
experts to issue their best judgements, such judgements may yet differ from each other
and the ideal. Which scoring rule should one use to choose between two experts? Savage
made the instructive statement that "any criteria for distinguishing among scoring
rules must arise out of departures of actual subjects from the ideal." There have
been some efforts to contrast scoring rules, but none seem to have followed this insight.
Bickel (2007) made empirical comparisons of the quadratic, spherical and logarithmic
scoring rules and found them to yield different rankings of competing forecasts but did
not explain why. Considering a concave nonlinear utility function that explicitly depends on
the scoring rule, he also found the logarithmic scoring rule to yield the least departures
from honest opinions at maximal utility, a point he claimed favours it as a rule of
choice. But a utility function need not be exponential nor explicitly depend on the
scoring rule. Jose et al. (2008) considered weighted scoring rules and showed that they
correspond to different utility functions. A limiting feature of the utility functions
considered is that they are defined on bounded intervals; there are many applications in
which the variable of interest is unbounded. Their motivation for weighted scoring rules
is based on betting arguments, but it is not clear what the betting strategies (if any)
are. Recently, Boero et al. (2011) empirically compared the Quadratic Probability Score
(QPS), Ranked Probability Score (RPS) and the logarithmic scoring rule on UK inflation
forecasts by the Monetary Policy Committee and the Survey of External Forecasters
(SEF). They found the scoring rules to rank the two sets of distributions similarly.
Upon ranking individual forecasters from the SEF, they found the RPS to have better
discriminatory power than the QPS, a feature they attributed to the RPS's sensitivity
to distance. Despite the foregoing efforts, a theoretical assessment of what the
preferences of the commonly used scoring rules are with respect to the ideal is lacking.

This paper contrasts how different scoring rules would rank competing forecasts of 
specified departures from ideal forecasts and provides guidance on scoring rule selection. 
It focuses upon those scoring rules that are commonly used in the forecasting literature, 
including econometrics and meteorology. More specifically, we contrast the relative 
information content of forecasts preferred by different scoring rules. Implications of the 
results on decision making are then suggested, noting that it may be desirable to be 
more or less uncertain when communicating probabilistic forecasts. We realise that an
appropriate utility function may be unknown (Bickel, 2007) and expected utility theory
may not even be appropriate (Kahneman and Tversky, 1979).

In section 2 we consider the case of scoring categorical forecasts by the Brier
score (Brier, 1950), the logarithmic scoring rule (Good, 1952) and the spherical scoring
rule (Friedman, 1983). For simplicity, special attention is focused on binary forecasts.
This section then inspires our study of density forecasts in section 3, where we
consider four scoring rules: the Quadratic Score (Gneiting and Raftery, 2007), Logarithmic
Score (Good, 1952), Spherical Score (Friedman, 1983) and Continuous Ranked
Probability Score (Epstein, 1969). We conclude with a discussion of the results in
section 4.



2 Categorical Forecasts 



In this section, we consider the scoring of categorical forecasts. The scoring rules
considered are the Brier score (Brier, 1950), the logarithmic scoring rule and the spherical
scoring rule (Friedman, 1983). In order to aid intuition in the next section, here we focus
on the binary case. Another commonly used scoring rule for categorical forecasts is the Ranked
Probability Score (RPS) (Epstein, 1969). In the binary case, the RPS reduces to
the Brier score.

It will be useful to be aware of the following basics. Given any vectors $f, g \in \mathbb{R}^m$,
the inner product between the two vectors is

\[ \langle f, g \rangle = \sum_{i=1}^{m} f_i g_i, \]

from which the L2-norm is defined by $\|f\|_2 = \langle f, f \rangle^{1/2}$.



2.1 The Brier score 

Consider a probabilistic forecast $\{f_i\}_{i=1}^{m}$ of $m$ categorical events. Suppose the true
distribution is $\{p_i\}_{i=1}^{m}$. If the actual outcome is the $j$th category, the Brier score is given
by (Brier, 1950)

\[ BS(f, j) = \frac{1}{m} \sum_{i=1}^{m} (f_i - \delta_{ij})^2, \]

where $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ij} = 1$ if $i = j$. It follows that if we expand out the bracket
we get

\[ BS(f, j) = \frac{1}{m} \Big( \sum_{i=1}^{m} f_i^2 - 2 f_j + 1 \Big). \]

The expected Brier score is then given by

\[
\begin{aligned}
E[BS(f, J)] &= \sum_{j=1}^{m} p_j \, BS(f, j) \\
&= \frac{1}{m} \sum_{i=1}^{m} \big( f_i^2 - 2 f_i p_i + p_i \big) \\
&= \frac{1}{m} \sum_{i=1}^{m} \big[ (f_i - p_i)^2 + p_i - p_i^2 \big] \\
&= \frac{1}{m} \Big\{ \|\gamma\|_2^2 + \sum_{i=1}^{m} p_i (1 - p_i) \Big\},
\end{aligned}
\]

where $\gamma$ is a vector with components $\gamma_i = f_i - p_i$ for all $i = 1, \ldots, m$. It is evident from
the last expression on the right hand side that the Brier score is effective with respect
to the metric $d_2(f, g) = \|f - g\|_2$. When $m = 2$, we can put $f_1 = p + \gamma$, $p_1 = p$ and
$p_2 = q$ and obtain

\[ E[BS(f, J)] = \gamma^2 + pq. \]

It follows that $\pm\gamma$ will yield the same Brier score. This means the Brier score does not
discriminate between over-estimating and under-estimating the probabilities by the
same amount. Furthermore, for any two forecasts $\mathbf{f}_i = (p + \gamma_i, q - \gamma_i)$, $i = 1, 2$, with
$|\gamma_1| < |\gamma_2|$, the Brier score would prefer the forecast corresponding to $\gamma_1$.
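The symmetry of the Brier score in $\pm\gamma$ can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function name `expected_brier` and the values $p = 0.7$, $q = 0.3$, $\gamma = 0.1$ are illustrative assumptions.

```python
def expected_brier(f, p):
    """Expected Brier score (1/m) * E[sum_i (f_i - delta_iJ)^2] when J is drawn from p."""
    m = len(f)
    total = 0.0
    for j, pj in enumerate(p):
        # Brier score of forecast f if category j materialises
        score_j = sum((fi - (1.0 if i == j else 0.0)) ** 2 for i, fi in enumerate(f)) / m
        total += pj * score_j
    return total


# Illustrative (assumed) binary truth and error size
p, q, gamma = 0.7, 0.3, 0.1

over = expected_brier((p + gamma, q - gamma), (p, q))   # over-estimates the likely outcome
under = expected_brier((p - gamma, q + gamma), (p, q))  # under-estimates it
closed_form = gamma ** 2 + p * q                        # binary expression from the text
```

Under these assumptions, `over`, `under` and `closed_form` agree up to rounding, which is the indifference to the sign of the error described above.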

2.2 Logarithmic scoring rule 

The logarithmic scoring rule was proposed by Good (1952). It was later termed Ignorance
by Roulston and Smith (2002) when they introduced it to the meteorological
community. Given a probabilistic forecast $f = (f_1, f_2, \ldots, f_m)$, the logarithmic scoring
rule is given by $LS(f, j) = -\log f_j$, where $j$ denotes the category that materialises. Let
us consider the expected logarithmic score of the forecasting scheme $f = (p + \gamma, q - \gamma)$:

\[ E[LS(f, J)] = -p \log(p + \gamma) - q \log(q - \gamma), \tag{1} \]

where $J \in \{1, 2\}$ is a random variable. The above expectation is also referred to as
the Kullback-Leibler Information Criterion (Corradi and Swanson, 2006). As noted
by Friedman (1983), this scoring rule is not effective.



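If we let $f_+ = (p + \gamma, q - \gamma)$ and $f_- = (p - \gamma, q + \gamma)$, then we can define
$E[LS]_\pm = E[LS(f_+, J)] - E[LS(f_-, J)]$. Then, assuming that $\gamma > 0$ without loss of generality,

\[ E[LS]_\pm = p \log\left(\frac{p - \gamma}{p + \gamma}\right) + q \log\left(\frac{q + \gamma}{q - \gamma}\right). \tag{2} \]

Note that when $p = q = 0.5$, then $E[LS]_\pm = 0$, otherwise $E[LS]_\pm \neq 0$. Differentiating (2)
with respect to $\gamma$ yields

\[ \frac{d}{d\gamma} E[LS]_\pm = \frac{2\gamma^2 (p^2 - q^2)}{(p^2 - \gamma^2)(q^2 - \gamma^2)}. \tag{3} \]

Expressions (2) and (3) are well defined provided $\gamma < \min(p, q)$. Hence

\[ \frac{d}{d\gamma} E[LS]_\pm > 0 \ \text{ if } p > q, \qquad \frac{d}{d\gamma} E[LS]_\pm < 0 \ \text{ if } p < q. \]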

It follows that $E[LS]_\pm > 0$ if $p > q$ and $E[LS]_\pm < 0$ if $p < q$. In other words, the
logarithmic score penalises overconfidence on the likely outcome and rewards erring on the
side of caution. Given forecasting schemes that are equally calibrated, the logarithmic
score will prefer the one with a higher entropy. To explain this further, let us denote
the entropy of the forecast corresponding to $\gamma$ by $h(\gamma)$, i.e.

\[ h(\gamma) = -(p + \gamma) \log(p + \gamma) - (q - \gamma) \log(q - \gamma). \tag{4} \]

We now define the function $G(\gamma) = h(\gamma) - h(-\gamma)$ and claim that $G(\gamma) < 0$ for
$0 < \gamma < q < p$. To prove this claim, we first note that $G(0) = 0$. It then suffices to show that
$G'(\gamma) < 0$ for all $\gamma \in (0, q)$. Note that

\[
\begin{aligned}
G'(\gamma) &= -\log(p + \gamma) + \log(q - \gamma) - \log(p - \gamma) + \log(q + \gamma) \\
&= -\log\left(\frac{p + \gamma}{q + \gamma}\right) + \log\left(\frac{q - \gamma}{p - \gamma}\right).
\end{aligned}
\]

The condition $p > q$ implies that $G'(\gamma) < 0$ for all $\gamma \in (0, q)$. Therefore, it is evident
that, of the two forecasts, the logarithmic score prefers the one with a higher entropy.
We have thus proved the following proposition:

Proposition 2.1 Given two forecasts, $f_+ = (p + \gamma, q - \gamma)$ and $f_- = (p - \gamma, q + \gamma)$,
where $0 < \gamma < q < p$, the logarithmic scoring rule prefers $f_-$. Moreover, $f_-$ has a
higher entropy than $f_+$.

What about when there are two forecasts $\mathbf{f}_i = (p + \gamma_i, q - \gamma_i)$, $i = 1, 2$, with
$0 < \gamma_1 < \gamma_2 < q$ and $p > q$? It is obvious that the Brier score will prefer $\mathbf{f}_1$ over $\mathbf{f}_2$.
The question is, which of the two forecasts will the logarithmic scoring rule prefer? We answer this
question by stating the following proposition:

Proposition 2.2 Given two forecasts $\mathbf{f}_i = (p + \gamma_i, q - \gamma_i)$, $i = 1, 2$, with
$0 < \gamma_1 < \gamma_2 < q$ and $p > q$, the logarithmic scoring rule prefers $\mathbf{f}_1$ over $\mathbf{f}_2$.

Proof. In order to prove this proposition, it is sufficient to consider the expected
logarithmic score of the forecast $f = (p + \gamma, q - \gamma)$, which is given by equation (1).
Differentiating the equation with respect to $\gamma$ yields

\[ \frac{d}{d\gamma} E[LS(f, J)] = -\frac{p}{p + \gamma} + \frac{q}{q - \gamma} = \frac{\gamma}{(p + \gamma)(q - \gamma)}. \tag{5} \]

Equation (5) implies that, if $q > \gamma > 0$, $E[LS(f, J)]$ is an increasing function of $\gamma$.
Hence, the logarithmic scoring rule prefers the forecast $\mathbf{f}_1$.

On the other hand, if $\gamma < 0$ with $|\gamma| < p$, then equation (5) implies that $E[LS(f, J)]$
is a decreasing function of $\gamma$. It then follows that, given $\gamma_2 < \gamma_1 < 0$ with $|\gamma_2| < p$, the
logarithmic scoring rule will prefer the forecast $\mathbf{f}_1$. ■
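Propositions 2.1 and 2.2 can be illustrated with a short numerical check. This is a sketch under assumed values ($p = 0.7$, $q = 0.3$ and the listed error sizes); the helper names are hypothetical and not from the original text.

```python
import math


def expected_log_score(f, p):
    """Expected logarithmic score E[-log f_J] when J is drawn from p."""
    return -sum(pi * math.log(fi) for fi, pi in zip(f, p))


def entropy(f):
    """Shannon entropy of a categorical forecast f."""
    return -sum(fi * math.log(fi) for fi in f)


# Illustrative (assumed) truth with p > q and error size 0 < gamma < q
p, q, gamma = 0.7, 0.3, 0.1
truth = (p, q)

f_plus = (p + gamma, q - gamma)    # over-confident on the likely outcome
f_minus = (p - gamma, q + gamma)   # errs towards more uncertainty

# Proposition 2.1: a positive gap means f_minus receives the lower (better) score.
gap = expected_log_score(f_plus, truth) - expected_log_score(f_minus, truth)

# Proposition 2.2: the expected score increases with gamma on (0, q).
scores = [expected_log_score((p + g, q - g), truth) for g in (0.05, 0.10, 0.20)]
```

With these values `gap` is positive, `entropy(f_minus)` exceeds `entropy(f_plus)`, and `scores` is increasing, in line with the two propositions.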

Finally, let us consider the case of two forecasts $\mathbf{f}_1 = (p + \gamma_1, q - \gamma_1)$ and
$\mathbf{f}_2 = (p - \gamma_2, q + \gamma_2)$, where $0 < \gamma_1 < \gamma_2 < q < p$. Again, it is clear that the Brier score
will prefer the forecast $\mathbf{f}_1$ over $\mathbf{f}_2$. It remains to be seen which forecast the logarithmic
scoring rule will prefer. This may be determined by considering the function $H(\gamma_1, \gamma_2)$,
where

\[ H(\gamma_1, \gamma_2) = p \log\left(\frac{p - \gamma_2}{p + \gamma_1}\right) + q \log\left(\frac{q + \gamma_2}{q - \gamma_1}\right). \tag{6} \]

Note that $H(\gamma_1, \gamma_2) = E[LS(\mathbf{f}_1, J)] - E[LS(\mathbf{f}_2, J)]$. The forecast $\mathbf{f}_1$ is preferred if
$H(\gamma_1, \gamma_2) < 0$. The following proposition gives insight into relative forecast performance
in the parameter space.

Proposition 2.3 Given that $0 < \gamma_2 < q < p$, there exists $\gamma_* \in (0, \gamma_2)$ such that (a)
$H(\gamma_*, \gamma_2) = 0$, (b) $H(\gamma_1, \gamma_2) > 0$ for $\gamma_1 \in (\gamma_*, \gamma_2)$ and (c) $H(\gamma_1, \gamma_2) < 0$ for $\gamma_1 \in (0, \gamma_*)$.

Before proving the above proposition, we remark that $H(\gamma_1, \gamma_2) < 0$ if and only if the
logarithmic scoring rule prefers the forecast $\mathbf{f}_1$. This proposition implies that the
logarithmic scoring rule and the Brier score prefer different forecasts when $\gamma_1 \in (\gamma_*, \gamma_2)$.
Let us now consider the proof of this proposition.

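Proof. In proving this proposition, it is useful to bear in mind that $H(\gamma_2, \gamma_2) > 0$.
The partial derivatives of equation (6) are given by

\[ \frac{\partial H}{\partial \gamma_1} = \frac{\gamma_1}{(p + \gamma_1)(q - \gamma_1)} \quad \text{and} \quad \frac{\partial H}{\partial \gamma_2} = \frac{-\gamma_2}{(p - \gamma_2)(q + \gamma_2)}. \tag{7} \]

Furthermore, we can differentiate equations (7) to obtain

\[ \frac{\partial^2 H}{\partial \gamma_1^2} = \frac{pq + \gamma_1^2}{(p + \gamma_1)^2 (q - \gamma_1)^2} \quad \text{and} \quad \frac{\partial^2 H}{\partial \gamma_2^2} = \frac{-(pq + \gamma_2^2)}{(p - \gamma_2)^2 (q + \gamma_2)^2}. \tag{8} \]

It follows from equations (7) that $\partial H / \partial \gamma_1 = 0$ at $\gamma_1 = 0$ and $\partial H / \partial \gamma_2 = 0$ at $\gamma_2 = 0$.
Since $\partial^2 H / \partial \gamma_1^2 > 0$ for all $\gamma_1$, $H(\gamma_1, \cdot)$ has a global minimum at $\gamma_1 = 0$. Similarly,
$H(\cdot, \gamma_2)$ has a global maximum at $\gamma_2 = 0$ since $\partial^2 H / \partial \gamma_2^2 < 0$ for all $\gamma_2$ and the first partial
derivative with respect to $\gamma_2$ vanishes there. In particular, $H(0, \gamma_2) \leq H(0, 0) = 0$, i.e.
$H(0, \gamma_2) \leq 0$. For $\gamma_2 > 0$, we have the strict inequality, $H(0, \gamma_2) < 0$. But we also have
$H(\gamma_2, \gamma_2) > 0$ from Proposition 2.1. It, therefore, follows from the intermediate value
theorem that $H(\gamma_1, \gamma_2) = 0$ for some $\gamma_1 = \gamma_* \in (0, \gamma_2)$. Because $\partial H / \partial \gamma_1 > 0$ for
$\gamma_1 \in (0, q)$ by equations (7), $H(\cdot, \gamma_2)$ is strictly increasing there, so parts (b) and (c)
follow, which completes the proof. ■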
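The crossover point $\gamma_*$ of Proposition 2.3 has no closed form given in the text, but it can be located numerically. The following is a sketch that uses bisection on equation (6); the function names, the tolerance and the values $p = 0.7$, $q = 0.3$, $\gamma_2 = 0.2$ are illustrative assumptions.

```python
import math


def H(g1, g2, p, q):
    """Equation (6): E[LS(f_1, J)] - E[LS(f_2, J)] for f_1 = (p+g1, q-g1), f_2 = (p-g2, q+g2)."""
    return p * math.log((p - g2) / (p + g1)) + q * math.log((q + g2) / (q - g1))


def gamma_star(g2, p, q, tol=1e-12):
    """Bisection for the root of H(., g2) on (0, g2); relies on H(0, g2) < 0 < H(g2, g2)."""
    lo, hi = 0.0, g2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H(mid, g2, p, q) < 0.0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid
    return 0.5 * (lo + hi)


# Illustrative (assumed) values with 0 < g2 < q < p
p, q, g2 = 0.7, 0.3, 0.2
gs = gamma_star(g2, p, q)
```

For $\gamma_1$ between `gs` and `g2` the logarithmic score then favours $\mathbf{f}_2$ even though the Brier score favours $\mathbf{f}_1$, which is the disagreement region identified by the proposition.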

Proposition 2.4 For positive $\gamma_1$ and $\gamma_2$ such that $\gamma_1 < q < p$ and $\gamma_2 < p$, the entropy
of the forecast $\mathbf{f}_1 = (p + \gamma_1, q - \gamma_1)$ is lower than that of the forecast $\mathbf{f}_2 = (p - \gamma_2, q + \gamma_2)$
whenever $\gamma_2 < (p - q)/2$.

A consequence of this proposition is that the forecast corresponding to $\gamma_1 = \gamma_*$ is more
informative than $\mathbf{f}_2$ provided $\gamma_2 < (p - q)/2$. Otherwise, either forecast could be more
informative than the other. We now give the proof of this proposition.

Proof. To prove the above proposition, we consider the derivative of equation (4):

\[ \frac{dh}{d\gamma} = -\log\left(\frac{p + \gamma}{q - \gamma}\right). \]

We then note that $dh/d\gamma < 0$ provided that $(p - q) > -2\gamma$. If $\gamma > 0$, this inequality is
trivially satisfied. On the other hand, if $\gamma < 0$, then the inequality is satisfied provided
$|\gamma| < (p - q)/2$. If $\gamma_2 < (p - q)/2$, then $h(\gamma)$ is a strictly decreasing function for
all $\gamma \in [-\gamma_2, \gamma_1]$, which implies that $h(-\gamma_2) > h(\gamma_1)$. If $\gamma_2 > (p - q)/2$, then $h(\gamma)$ is an
increasing function for all $\gamma \in (-\gamma_2, -(p - q)/2)$ (provided $p > 3q$) and a strictly decreasing
function in $(-(p - q)/2, \gamma_1)$, which implies that $h(-(p - q)/2) > \max\{h(\gamma_1), h(-\gamma_2)\}$.
Hence, in this case, we cannot determine which of $h(\gamma_1)$ and $h(-\gamma_2)$ is lower. ■

2.3 The Spherical Scoring Rule 

The spherical scoring rule is given by

\[ S(f, j) = -\frac{f_j}{\|f\|_2}. \]

Define $f_- = p - \gamma$ and $f_+ = p + \gamma$. Which of the two forecasts $f_-$ and $f_+$ does the
spherical scoring rule prefer? In order to address this question, we appeal to geometry.
Considering $f = p + \gamma$, the dot product rule yields

\[ \|p\|_2 \|f\|_2 \cos\theta = \langle f, p \rangle, \]

where $\theta$ is the angle between $f$ and $p$. The above formula may be rewritten as

\[ \cos\theta = \frac{\|p\|_2^2 + \langle \gamma, p \rangle}{\|p\|_2 \|f\|_2}. \tag{9} \]

We then state the following proposition:

Proposition 2.5 If $p = (p, q)$ and $\gamma = (\gamma, -\gamma)$ and if we denote the right hand side of
equation (9) by $C(\gamma)$, then

\[ \frac{dC(\gamma)}{d\gamma} = \frac{-\gamma}{\|p\|_2 \|f\|_2^3}. \]



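Proof. First note that $\|f\|_2^2 = \|p\|_2^2 + 2\langle \gamma, p \rangle + \|\gamma\|_2^2$, with $\langle \gamma, p \rangle = \gamma(p - q)$ and
$\|\gamma\|_2^2 = 2\gamma^2$, so that

\[ \frac{d\|f\|_2}{d\gamma} = \frac{(p - q) + 2\gamma}{\|f\|_2}. \]

Using the quotient rule, we then differentiate $C(\gamma)$ with respect to $\gamma$ to obtain

\[
\begin{aligned}
\frac{dC(\gamma)}{d\gamma}
&= \frac{\|f\|_2 \frac{d}{d\gamma}\big(\|p\|_2^2 + \langle \gamma, p \rangle\big) - \big(\|p\|_2^2 + \langle \gamma, p \rangle\big) \frac{d}{d\gamma}\|f\|_2}{\|p\|_2 \|f\|_2^2} \\
&= \frac{\|f\|_2^2 (p - q) - \big(\|p\|_2^2 + \langle \gamma, p \rangle\big)\big[(p - q) + 2\gamma\big]}{\|p\|_2 \|f\|_2^3} \\
&= \frac{(p - q)\big(\|p\|_2^2 + 2\langle \gamma, p \rangle + \|\gamma\|_2^2\big) - \big(\|p\|_2^2 + \langle \gamma, p \rangle\big)\big[(p - q) + 2\gamma\big]}{\|p\|_2 \|f\|_2^3} \\
&= \frac{-2\gamma \|p\|_2^2 + \gamma (p - q)^2}{\|p\|_2 \|f\|_2^3} \\
&= \frac{-2\gamma \|p\|_2^2 + \gamma \big(\|p\|_2^2 - 2pq\big)}{\|p\|_2 \|f\|_2^3} \\
&= \frac{-\gamma \big(\|p\|_2^2 + 2pq\big)}{\|p\|_2 \|f\|_2^3} \\
&= \frac{-\gamma (p + q)^2}{\|p\|_2 \|f\|_2^3}.
\end{aligned}
\]

The desired result follows from noting that $p + q = 1$. ■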

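Proposition 2.6 Suppose that $p > q$ and $\gamma \in (0, q)$. Then the spherical scoring rule
prefers the lower entropy forecast, $f_+$, instead of $f_-$.

Proof. Since the spherical scoring rule is effective, it suffices for us to show that
$d_*(f_+, p) < d_*(f_-, p)$. Suppose the angles that each of $f_+$ and $f_-$ makes with $p$ are
respectively $\theta_+$ and $\theta_-$. It is then true that $d_*(f_+, p) < d_*(f_-, p)$ if and only if $\theta_+ < \theta_-$
since each distance is the length of a chord on a unit circle. Note that $C(0) = 1$ and
$C'(0) = 0$. From Proposition 2.5, $-C'(\tau) < C'(-\tau)$ for all $\tau \in (0, \gamma)$, which implies that

\[
\begin{aligned}
-\int_0^{\gamma} C'(\tau)\,d\tau < \int_0^{\gamma} C'(-\tau)\,d\tau
&\;\Rightarrow\; -\int_0^{\gamma} C'(\tau)\,d\tau < -\int_0^{-\gamma} C'(\tau)\,d\tau \\
&\;\Rightarrow\; \int_0^{\gamma} C'(\tau)\,d\tau > \int_0^{-\gamma} C'(\tau)\,d\tau \\
&\;\Rightarrow\; C(\tau)\big|_0^{\gamma} > C(\tau)\big|_0^{-\gamma} \\
&\;\Rightarrow\; C(\gamma) - C(0) > C(-\gamma) - C(0) \\
&\;\Rightarrow\; C(\gamma) > C(-\gamma).
\end{aligned}
\]

But $C(\gamma) > C(-\gamma)$ implies that $\theta_+ < \theta_-$. ■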
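A numerical illustration of Propositions 2.5 and 2.6 follows. It is a sketch only: the helper names, the finite-difference step and the values $p = 0.7$, $q = 0.3$, $\gamma = 0.1$ are assumptions chosen for illustration.

```python
import math


def norm2(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def expected_spherical(f, p):
    """Expected spherical score E[-f_J / ||f||_2] when J is drawn from p."""
    return -sum(pi * fi for fi, pi in zip(f, p)) / norm2(f)


def cos_angle(gamma, p, q):
    """C(gamma): cosine of the angle between f = (p + gamma, q - gamma) and (p, q)."""
    f = (p + gamma, q - gamma)
    return (p * f[0] + q * f[1]) / (norm2((p, q)) * norm2(f))


# Illustrative (assumed) values with p > q and 0 < gamma < q
p, q, gamma = 0.7, 0.3, 0.1
truth = (p, q)

score_plus = expected_spherical((p + gamma, q - gamma), truth)   # lower entropy forecast
score_minus = expected_spherical((p - gamma, q + gamma), truth)  # higher entropy forecast

# Proposition 2.5: compare a central finite difference of C with the closed form.
h = 1e-6
numeric = (cos_angle(gamma + h, p, q) - cos_angle(gamma - h, p, q)) / (2.0 * h)
closed = -gamma / (norm2(truth) * norm2((p + gamma, q - gamma)) ** 3)
```

Since scores are treated as losses, `score_plus < score_minus` means the spherical rule prefers $f_+$, the opposite preference to the logarithmic rule for the same pair of forecasts.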



3 Density Forecasts 

This section considers scoring rules for forecasts of continuous variables. It is in some
sense a generalisation of the previous section. As before, we consider how each scoring
rule would rank two competing predictive distributions of fairly good quality. In the
case of the logarithmic scoring rule and the Continuous Ranked Probability Score, we
consider errors of each predictive distribution, $f(x)$, from the target distribution, $p(x)$,
that are odd functions, i.e. $\gamma(x) = f(x) - p(x)$ with $\gamma(-x) = -\gamma(x)$.

Familiarity with the following notation and definitions will be useful. Given two
functions $f(x)$ and $g(x)$ that are bounded, an inner product is defined by

\[ \langle f, g \rangle = \int_{-\infty}^{\infty} f(x) g(x)\,dx. \]

Then the L2 norm is defined to be $\|f\|_2 = \langle f, f \rangle^{1/2}$.



3.1 The Quadratic scoring rule 



A continuous counterpart of the Brier score is the quadratic score (Gneiting and Raftery,
2007), given by

\[ QS(f, X) = \|f\|_2^2 - 2 f(X), \]

where $X$ is a random variable. Taking the expectation yields

\[ E[QS(f, X)] = \|f - p\|_2^2 - \|p\|_2^2. \tag{10} \]

We can now write $f(x) = p(x) + \gamma(x)$, where $\int \gamma(x)\,dx = 0$, and substitute it into (10)
to obtain

\[ E[QS(f, X)] = \|\gamma\|_2^2 - \|p\|_2^2. \]

As was the case with the Brier score, the functions $\pm\gamma(x)$ yield the same quadratic
score. For any two forecasts, $f_i(x) = p(x) + \gamma_i(x)$, $i = 1, 2$, with $\|\gamma_1\|_2 < \|\gamma_2\|_2$, the
quadratic scoring rule would prefer $f_1(x)$. Furthermore, $\|\gamma_1\|_2 = \|\gamma_2\|_2$ implies that
$E[QS(f_1, X)] = E[QS(f_2, X)]$.



3.2 The Logarithmic scoring rule 

The expectation of the logarithmic scoring rule for the forecast is

\[ E[LS(f, X)] = -\int p(x) \log\big(p(x) + \gamma(x)\big)\,dx. \]

As in the discrete case, we introduce the pdfs $f_+(x) = p(x) + \gamma(x)$ and $f_-(x) = p(x) - \gamma(x)$
so that we can define $E[LS]_\pm = E[LS(f_+, X)] - E[LS(f_-, X)]$. It follows that

\[ E[LS]_\pm = \int p(x) \log\left(\frac{p(x) - \gamma(x)}{p(x) + \gamma(x)}\right) dx. \tag{11} \]

It is necessary that $|\gamma(x)| < p(x)$ for (11) to be well defined. Consider the case when
$p(x) = p(-x)$. If, in addition, $\gamma(x)$ is an odd function, i.e. $\gamma(-x) = -\gamma(x)$, then
equation (11) yields $E[LS]_\pm = 0$.

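When $\gamma(-x) = -\gamma(x)$ and $\int_{-\infty}^{0} p(x)\,dx > 0.5$, we state the following proposition:

Proposition 3.1 Given that $\gamma(-x) = -\gamma(x)$ with $\gamma(|x|) \leq 0$ and $p(|x|) \leq p(x)$, then
$E[LS]_\pm \geq 0$.

The above proposition gives conditions under which the forecast $f_-(x)$ is preferred by
the logarithmic scoring rule over $f_+(x)$.

Proof. The proof proceeds as follows:

\[ E[LS]_\pm = \int_{-\infty}^{0} p(x) \log\left(\frac{p(x) - \gamma(x)}{p(x) + \gamma(x)}\right) dx + \int_{0}^{\infty} p(x) \log\left(\frac{p(x) - \gamma(x)}{p(x) + \gamma(x)}\right) dx. \]

If we now perform a change of variable $u = -x$ in the right hand integral and then
replace $u$ by $x$, we obtain

\[
\begin{aligned}
E[LS]_\pm &= \int_{-\infty}^{0} p(x) \log\left(\frac{p(x) - \gamma(x)}{p(x) + \gamma(x)}\right) dx + \int_{-\infty}^{0} p(-x) \log\left(\frac{p(-x) + \gamma(x)}{p(-x) - \gamma(x)}\right) dx \\
&\geq 0,
\end{aligned}
\]

where we used $p(-x) \leq p(x)$ for $x < 0$ to obtain the last inequality. To justify the use of this
inequality, we need to show that the function

\[ \Phi(p) = p \log\left(\frac{p + \gamma}{p - \gamma}\right) \]

is a decreasing function for $\gamma \in (0, p)$; the integrand above is then
$\Phi(p(-x)) - \Phi(p(x)) \geq 0$ with $\gamma = \gamma(x) \geq 0$ for $x < 0$. Differentiating with respect to $p$ yields

\[ \Phi'(p) = \log\left(\frac{p + \gamma}{p - \gamma}\right) - \frac{2p\gamma}{p^2 - \gamma^2}. \]

It now suffices to show that $\Phi'(p) < 0$ for all $p$. Let us introduce the notation
$W(p) = \log[(p + \gamma)/(p - \gamma)]$ and $Y(p) = 2p\gamma/(p^2 - \gamma^2)$ so that $\Phi'(p) = W(p) - Y(p)$. Note that
$W(2\gamma) = \log 3$ and $Y(2\gamma) = 4/3 = \log(e^{4/3})$. Hence $W(2\gamma) < Y(2\gamma)$, which implies that
$\Phi'(2\gamma) < 0$. Differentiating $W(p)$ and $Y(p)$ with respect to $p$ yields

\[ W'(p) = \frac{-2\gamma}{p^2 - \gamma^2} \quad \text{and} \quad Y'(p) = \frac{-2\gamma (p^2 + \gamma^2)}{(p^2 - \gamma^2)^2}. \]

It is now clear that $W'(p) < 0$ and $Y'(p) < 0$ for all $p$. Furthermore, $Y'(p) < W'(p)$.
Hence $W(p) < Y(p)$ for all $p \in (\gamma, 2\gamma]$, which implies that $\Phi'(p) < 0$ for all $p \in (\gamma, 2\gamma]$.

It now remains to be shown that $\Phi'(p) < 0$ for all $p \in (2\gamma, \infty)$. It suffices to consider
the asymptotic behaviour as $p \to \infty$. Applying L'Hôpital's rule, we obtain

\[ \lim_{p \to \infty} \frac{W(p)}{Y(p)} = \lim_{p \to \infty} \frac{W'(p)}{Y'(p)} = \lim_{p \to \infty} \frac{p^2 - \gamma^2}{p^2 + \gamma^2} = 1. \]

Hence, $\lim_{p \to \infty} W(p) = \lim_{p \to \infty} Y(p)$, i.e. $W(\infty) = Y(\infty)$. With this result in mind,
for all $p \in [2\gamma, \infty)$, we have

\[
\begin{aligned}
\int_p^{\infty} Y'(\tau)\,d\tau < \int_p^{\infty} W'(\tau)\,d\tau
&\;\Rightarrow\; Y(\tau)\big|_p^{\infty} < W(\tau)\big|_p^{\infty} \\
&\;\Rightarrow\; Y(\infty) - Y(p) < W(\infty) - W(p) \\
&\;\Rightarrow\; W(p) < Y(p),
\end{aligned}
\]

which completes the proof. The condition $p(|x|) \leq p(x)$ implies that $\int_{-\infty}^{0} p(x)\,dx \geq 0.5$.
It corresponds to the case $p > q$ in the discrete case. ■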

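We now want to compare the entropies of the forecasts $f(x) = p(x) \pm \gamma(x)$ when
$\gamma(-x) = -\gamma(x)$ and $\gamma(|x|) \leq \gamma(x)$. The entropy of the function $f(x) = p(x) + \gamma(x)$ is
then given by

\[ h(\gamma) = -\int \big(p(x) + \gamma(x)\big) \log\big(p(x) + \gamma(x)\big)\,dx. \tag{12} \]

The functional derivative of $h(\gamma)$ with respect to $\gamma$ is then given by

\[ \frac{\delta h(\gamma)}{\delta \gamma(x)} = -\frac{\partial}{\partial \gamma(x)} \Big\{ \big(p(x) + \gamma(x)\big) \log\big(p(x) + \gamma(x)\big) \Big\} = -\big[\log(p + \gamma) + 1\big]. \tag{13} \]

The order $O(\epsilon)$ part of $h(\gamma + \epsilon\,\delta\gamma) - h(\gamma)$ is given by (see Stone and Goldbart (2008) for
further insights)

\[ \delta h(\gamma) = \int \frac{\delta h(\gamma)}{\delta \gamma(x)}\,\delta\gamma(x)\,dx. \tag{14} \]

Plugging (13) into (14) yields

\[
\begin{aligned}
\delta h(\gamma)
&= -\int_{-\infty}^{\infty} \big[\log(p(x) + \gamma(x)) + 1\big]\,\delta\gamma(x)\,dx \\
&= -\int_{-\infty}^{0} \big[\log(p(x) + \gamma(x)) + 1\big]\,\delta\gamma(x)\,dx - \int_{0}^{\infty} \big[\log(p(x) + \gamma(x)) + 1\big]\,\delta\gamma(x)\,dx \\
&= -\int_{-\infty}^{0} \big[\log(p(x) + \gamma(x)) + 1\big]\,\delta\gamma(x)\,dx + \int_{0}^{-\infty} \big[\log(p(-x) + \gamma(-x)) + 1\big]\,\delta\gamma(-x)\,dx \\
&= -\int_{-\infty}^{0} \big[\log(p(x) + \gamma(x)) + 1\big]\,\delta\gamma(x)\,dx - \int_{-\infty}^{0} \big[\log(p(-x) + \gamma(-x)) + 1\big]\,\delta\gamma(-x)\,dx \\
&= -\int_{-\infty}^{0} \big[\log(p(x) + \gamma(x)) + 1\big]\,\delta\gamma(x)\,dx + \int_{-\infty}^{0} \big[\log(p(-x) - \gamma(x)) + 1\big]\,\delta\gamma(x)\,dx \\
&= -\int_{-\infty}^{0} \log\left(\frac{p(x) + \gamma(x)}{p(-x) - \gamma(x)}\right) \delta\gamma(x)\,dx,
\end{aligned}
\]

where we have applied a change of variable $x \to -x$ in the second integral of the third
line and assumed $\delta\gamma(-x) = -\delta\gamma(x)$ in the fifth line. In particular,

\[ \delta h(\gamma)\big|_{\gamma = 0} = -\int_{-\infty}^{0} \log\left(\frac{p(x)}{p(-x)}\right) \delta\gamma(x)\,dx. \]

Using the assumption that $p(x) \geq p(-x)$ whenever $x < 0$, we consequently obtain

\[ \delta h(\gamma)\big|_{\gamma = 0} \leq 0, \tag{15} \]

if $\delta\gamma(x) \geq 0$ for all $x < 0$. In effect, we have just proved the following proposition:

Proposition 3.2 Given that $\gamma(-x) = -\gamma(x)$, $\int \gamma(x)\,dx = 0$, $\gamma(|x|) \leq 0$, $p(|x|) \leq p(x)$
and $|\gamma(x)| < p(x)$, then the entropy of the forecast density $f_+(x) = p(x) + \gamma(x)$ is lower
than that of the forecast density $f_-(x) = p(x) - \gamma(x)$.

Propositions 3.1 and 3.2 imply that the logarithmic scoring rule prefers the forecast
density that is less informative, which is in agreement with the categorical case considered
in the previous section.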
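Propositions 3.1 and 3.2 can be illustrated on a concrete pair of densities. The sketch below is an illustration under assumptions that are not in the original text: the target density is taken as $p(x) = \varphi(x)(1 - a \tanh x)$ and the error as $\gamma(x) = -b\,\varphi(x) \tanh x$, with $\varphi$ the standard normal density and $a + b < 1$, which satisfies the hypotheses of both propositions. Integrals are approximated by a hand-written trapezoid rule on $[-10, 10]$.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p(x, a=0.3):
    """Assumed target density: more mass on the negative side, so p(|x|) <= p(x)."""
    return phi(x) * (1.0 - a * math.tanh(x))


def gamma(x, b=0.2):
    """Assumed odd error with gamma(|x|) <= 0 and |gamma| < p (requires a + b < 1)."""
    return -b * phi(x) * math.tanh(x)


def integrate(func, lo=-10.0, hi=10.0, n=20000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, n):
        total += func(lo + k * h)
    return total * h


def f_plus(x):
    return p(x) + gamma(x)


def f_minus(x):
    return p(x) - gamma(x)


# Equation (11): E[LS(f_plus)] - E[LS(f_minus)]; positive means f_minus is preferred.
log_gap = integrate(lambda x: p(x) * math.log(f_minus(x) / f_plus(x)))

# Equation (12): differential entropies of the two forecast densities.
entropy_plus = integrate(lambda x: -f_plus(x) * math.log(f_plus(x)))
entropy_minus = integrate(lambda x: -f_minus(x) * math.log(f_minus(x)))
```

For this assumed pair, `log_gap` is positive and `entropy_minus` exceeds `entropy_plus`, so the forecast preferred by the logarithmic rule is the higher entropy one.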

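Proposition 3.3 Given two forecasts $f_i(x) = p(x) + \gamma_i(x)$, $i = 1, 2$, with (i) $|\gamma_1(x)| <
|\gamma_2(x)|$, (ii) $\gamma_i(|x|) \leq 0$, (iii) $\gamma_i(-x) = -\gamma_i(x)$, (iv) $|\gamma_i(x)| < p(x)$ and (v) $p(|x|) \leq p(x)$,
then the logarithmic scoring rule prefers forecast $f_1(x)$ over forecast $f_2(x)$.

Proof. To prove the above proposition, we consider the functional derivative of the
expected logarithmic scoring rule, $E[LS] = -\int p(x) \log(p(x) + \gamma(x))\,dx$. The functional
derivative with respect to $\gamma(x)$ is

\[ \frac{\delta E[LS]}{\delta \gamma(x)} = \frac{-p(x)}{p(x) + \gamma(x)}. \]

Using this result, we obtain the first variation of $E[LS]$ as

\[
\begin{aligned}
\delta E[LS]
&= \int_{-\infty}^{\infty} \frac{\delta E[LS]}{\delta \gamma(x)}\,\delta\gamma(x)\,dx \\
&= \int_{-\infty}^{\infty} \frac{-p(x)}{p(x) + \gamma(x)}\,\delta\gamma(x)\,dx \\
&= \int_{-\infty}^{0} \frac{-p(x)}{p(x) + \gamma(x)}\,\delta\gamma(x)\,dx - \int_{0}^{\infty} \frac{p(x)}{p(x) + \gamma(x)}\,\delta\gamma(x)\,dx \\
&= \int_{-\infty}^{0} \frac{-p(x)}{p(x) + \gamma(x)}\,\delta\gamma(x)\,dx + \int_{0}^{-\infty} \frac{p(-x)}{p(-x) + \gamma(-x)}\,\delta\gamma(-x)\,dx \\
&= \int_{-\infty}^{0} \frac{-p(x)}{p(x) + \gamma(x)}\,\delta\gamma(x)\,dx + \int_{-\infty}^{0} \frac{p(-x)}{p(-x) - \gamma(x)}\,\delta\gamma(x)\,dx \\
&= \int_{-\infty}^{0} \left[ \frac{p(-x)}{p(-x) - \gamma(x)} - \frac{p(x)}{p(x) + \gamma(x)} \right] \delta\gamma(x)\,dx \\
&= \int_{-\infty}^{0} \frac{[p(-x) + p(x)]\,\gamma(x)}{[p(-x) - \gamma(x)][p(x) + \gamma(x)]}\,\delta\gamma(x)\,dx \\
&\geq 0,
\end{aligned}
\]

provided $\delta\gamma(x) \geq 0$ for all $x < 0$, $\delta\gamma(-x) = -\delta\gamma(x)$, $\gamma(-x) = -\gamma(x)$ and $\gamma(|x|) \leq 0$.
What has been shown is that as $\gamma(x)$ changes by $\delta\gamma(x)$, the expected logarithmic
score changes by a positive amount. In particular, if we start at $\gamma(x) = \gamma_1(x)$, and
progressively move towards $\gamma(x) = \gamma_2(x)$ by making successive additions of $\delta\gamma(x)$, the
expected logarithmic score can only increase. Hence the expected logarithmic score of
$\gamma_2(x)$ will be higher than that of $\gamma_1(x)$, which yields the result. ■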

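We shall now consider two forecasts, $f_1(x) = p(x) + \gamma_1(x)$ and $f_2(x) = p(x) - \gamma_2(x)$
with $|\gamma_1(x)| < |\gamma_2(x)| < p(x)$. In this case, the quadratic scoring rule would prefer $f_1(x)$
over $f_2(x)$. In order to determine which forecast the logarithmic scoring rule would prefer,
we consider the functional

\[ \mathcal{H}(\gamma_1, \gamma_2) = \int_{-\infty}^{\infty} p(x) \log\left(\frac{p(x) - \gamma_2(x)}{p(x) + \gamma_1(x)}\right) dx. \tag{16} \]

Then the following proposition holds.

Proposition 3.4 Given that $|\gamma_1(x)| < |\gamma_2(x)| < p(x)$ and $\gamma_i(-x) = -\gamma_i(x)$, $i = 1, 2$,
there exists $\gamma_*(x)$ satisfying the inequalities $\gamma_*(x)\gamma_2(x) > 0$ and $|\gamma_*(x)| < |\gamma_2(x)|$ such
that (a) $\mathcal{H}(\gamma_*, \gamma_2) = 0$, (b) $\mathcal{H}(\gamma_1, \gamma_2) > 0$ for $|\gamma_*| < |\gamma_1|$ and (c) $\mathcal{H}(\gamma_1, \gamma_2) < 0$ for
$|\gamma_*| > |\gamma_1|$.

Proof. It is helpful to first note that Proposition 3.1 implies that $\mathcal{H}(\gamma_2, \gamma_2) > 0$ when
$\gamma_2 \neq 0$. Thinking of $\gamma_1(x)$ as fixed, the first variation of $\mathcal{H}(\cdot, \gamma_2)$ with respect to $\gamma_2(x)$ is
given by

\[
\begin{aligned}
\delta \mathcal{H}(\cdot, \gamma_2)
&= \int_{-\infty}^{\infty} \frac{\delta \mathcal{H}}{\delta \gamma_2(x)}\,\delta\gamma_2(x)\,dx \\
&= -\int_{-\infty}^{\infty} \frac{p(x)}{p(x) - \gamma_2(x)}\,\delta\gamma_2(x)\,dx \\
&= -\int_{-\infty}^{0} \frac{p(x)}{p(x) - \gamma_2(x)}\,\delta\gamma_2(x)\,dx - \int_{0}^{\infty} \frac{p(x)}{p(x) - \gamma_2(x)}\,\delta\gamma_2(x)\,dx \\
&= -\int_{-\infty}^{0} \frac{p(x)}{p(x) - \gamma_2(x)}\,\delta\gamma_2(x)\,dx - \int_{0}^{-\infty} \frac{p(-x)}{p(-x) + \gamma_2(x)}\,\delta\gamma_2(x)\,dx \\
&= -\int_{-\infty}^{0} \frac{p(x)}{p(x) - \gamma_2(x)}\,\delta\gamma_2(x)\,dx + \int_{-\infty}^{0} \frac{p(-x)}{p(-x) + \gamma_2(x)}\,\delta\gamma_2(x)\,dx \\
&= \int_{-\infty}^{0} \left[ \frac{p(-x)}{p(-x) + \gamma_2(x)} - \frac{p(x)}{p(x) - \gamma_2(x)} \right] \delta\gamma_2(x)\,dx \\
&= \int_{-\infty}^{0} \frac{-[p(-x) + p(x)]\,\gamma_2(x)}{[p(-x) + \gamma_2(x)][p(x) - \gamma_2(x)]}\,\delta\gamma_2(x)\,dx \\
&\leq 0,
\end{aligned}
\]

provided $\delta\gamma_2 \geq 0$ and $\delta\gamma_2(-x) = -\delta\gamma_2(x)$. In the fourth line a change of variable
$x = -r$ was applied and then $r$ was replaced with $x$ since it is a dummy variable. It
follows that $\mathcal{H}(\cdot, \gamma_2)$ has a maximum when $\gamma_2 = 0$, i.e. $\mathcal{H}(\cdot, \gamma_2) \leq \mathcal{H}(\cdot, 0)$. In particular,
$\mathcal{H}(0, \gamma_2) \leq \mathcal{H}(0, 0) = 0$. For $\gamma_2 \neq 0$, we have the strict inequality, $\mathcal{H}(0, \gamma_2) < 0$. Since
$\mathcal{H}(\gamma_2, \gamma_2) > 0$, continuity implies that $\mathcal{H}(\gamma_1, \gamma_2) = 0$ for some $\gamma_1(x) = \gamma_*(x)$ such that
$|\gamma_*| < |\gamma_2|$, and this completes the proof. ■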

3.3 The Spherical Scoring Rule 

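Given a forecast $f(x)$, the spherical scoring rule is given by

\[ S(f, X) = -\frac{f(X)}{\|f\|_2}. \]

If we define the operator $\rho f = f(x)/\|f\|_2$, the expected spherical score is the inner
product

\[ E[S(f, X)] = -\langle \rho f, p \rangle. \]

The minimum of this expectation is achieved if and only if $f = p$ since it is a strictly
proper scoring rule (Friedman, 1983). We now state the following proposition:

Proposition 3.5 Given that $\gamma(-x) = -\gamma(x)$ with $\gamma(|x|) \leq 0$, $|\gamma(x)| < p(x)$ and
$p(|x|) \leq p(x)$, then the spherical scoring rule prefers the forecast $f_+(x)$ over $f_-(x)$,
i.e. $E[S(f_+, X)] \leq E[S(f_-, X)]$.

Proof. The aim here is to show that $E[S(f_+, X)] \leq E[S(f_-, X)]$, which is equivalent to
$\langle \rho f_+, p \rangle \geq \langle \rho f_-, p \rangle$. Note that each of these inner products is non-negative since

\[
\langle \rho f_\pm, p \rangle = \frac{\langle f_\pm, p \rangle}{\|f_\pm\|_2}
= \frac{\langle p \pm \gamma, p \rangle}{\|f_\pm\|_2}
= \frac{\|p\|_2^2 \pm \langle \gamma, p \rangle}{\|f_\pm\|_2}
\geq 0,
\]

due to the Cauchy-Schwarz inequality, $\langle \pm\gamma, p \rangle \leq \|\gamma\|_2 \|p\|_2$, and the hypothesis, $|\gamma(x)| <
p(x) \Rightarrow \|\gamma\|_2 < \|p\|_2$. Therefore, $\langle \rho f_+, p \rangle \geq \langle \rho f_-, p \rangle$ is equivalent to
$\langle \rho f_+, p \rangle^2 \geq \langle \rho f_-, p \rangle^2$. It therefore suffices to show that the latter inequality holds.

\[
\begin{aligned}
\langle \rho f_+, p \rangle^2 - \langle \rho f_-, p \rangle^2
&= \frac{\langle f_+, p \rangle^2}{\|f_+\|_2^2} - \frac{\langle f_-, p \rangle^2}{\|f_-\|_2^2} \\
&= \frac{\langle p + \gamma, p \rangle^2}{\|f_+\|_2^2} - \frac{\langle p - \gamma, p \rangle^2}{\|f_-\|_2^2} \\
&= \frac{\big(\|p\|_2^2 + \langle \gamma, p \rangle\big)^2}{\|f_+\|_2^2} - \frac{\big(\|p\|_2^2 - \langle \gamma, p \rangle\big)^2}{\|f_-\|_2^2} \\
&= \frac{\|f_-\|_2^2 \big(\|p\|_2^2 + \langle \gamma, p \rangle\big)^2 - \|f_+\|_2^2 \big(\|p\|_2^2 - \langle \gamma, p \rangle\big)^2}{\|f_+\|_2^2 \|f_-\|_2^2}.
\end{aligned}
\]

Plugging $\|f_+\|_2^2 = \|p\|_2^2 + 2\langle \gamma, p \rangle + \|\gamma\|_2^2$ and $\|f_-\|_2^2 = \|p\|_2^2 - 2\langle \gamma, p \rangle + \|\gamma\|_2^2$ into the
numerator of the last expression, removing brackets and collecting like terms yields

\[ \langle \rho f_+, p \rangle^2 - \langle \rho f_-, p \rangle^2 = \frac{4 \langle \gamma, p \rangle \big( \|p\|_2^2 \|\gamma\|_2^2 - \langle \gamma, p \rangle^2 \big)}{\|f_+\|_2^2 \|f_-\|_2^2}. \tag{17} \]

As a consequence of the Cauchy-Schwarz inequality, $\|p\|_2^2 \|\gamma\|_2^2 - \langle \gamma, p \rangle^2 \geq 0$. It will now
be shown that under the hypothesis of the proposition, $\langle \gamma, p \rangle \geq 0$.

\[
\begin{aligned}
\langle \gamma, p \rangle
&= \int_{-\infty}^{\infty} \gamma(x) p(x)\,dx \\
&= \int_{-\infty}^{0} \gamma(x) p(x)\,dx + \int_{0}^{\infty} \gamma(x) p(x)\,dx \\
&= \int_{-\infty}^{0} \gamma(x) p(x)\,dx - \int_{0}^{-\infty} \gamma(-x) p(-x)\,dx \\
&= \int_{-\infty}^{0} \gamma(x) p(x)\,dx + \int_{-\infty}^{0} \gamma(-x) p(-x)\,dx \\
&= \int_{-\infty}^{0} \gamma(x) p(x)\,dx - \int_{-\infty}^{0} \gamma(x) p(-x)\,dx \\
&= \int_{-\infty}^{0} \gamma(x) \big[p(x) - p(-x)\big]\,dx \\
&\geq 0,
\end{aligned}
\]

since $p(x) \geq p(-x)$ and $\gamma(x) \geq 0$ for all $x < 0$. Hence, the right hand side of
equation (17) is non-negative. ■

The distribution preferred by the spherical scoring rule is already known through
Proposition 3.2 to be of lower entropy. As was the case in the binary case, the spherical
scoring rule prefers an opposite distribution to the logarithmic scoring rule.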
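The preference stated in Proposition 3.5 can be checked on an example. This sketch assumes, for illustration only, the target density $p(x) = \varphi(x)(1 - a \tanh x)$ and the odd error $\gamma(x) = -b\,\varphi(x) \tanh x$ with $\varphi$ the standard normal density and $a + b < 1$; these choices and the helper names are not from the original text.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p(x, a=0.3):
    """Assumed target density with p(|x|) <= p(x)."""
    return phi(x) * (1.0 - a * math.tanh(x))


def gamma(x, b=0.2):
    """Assumed odd error with gamma(|x|) <= 0 and |gamma| < p."""
    return -b * phi(x) * math.tanh(x)


def integrate(func, lo=-10.0, hi=10.0, n=20000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, n):
        total += func(lo + k * h)
    return total * h


def expected_spherical(f):
    """E[S(f, X)] = -<f, p> / ||f||_2 with X drawn from the target density p."""
    inner = integrate(lambda x: f(x) * p(x))
    norm = math.sqrt(integrate(lambda x: f(x) ** 2))
    return -inner / norm


score_plus = expected_spherical(lambda x: p(x) + gamma(x))   # lower entropy forecast f_+
score_minus = expected_spherical(lambda x: p(x) - gamma(x))  # higher entropy forecast f_-
score_truth = expected_spherical(p)                          # ideal forecast
```

Treating scores as losses, the ordering `score_truth < score_plus < score_minus` for this assumed pair matches the proposition: the spherical rule favours $f_+$.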



3.4 Continuous Ranked Probability Score 

Finally, we consider the Continuous Ranked Probability Score (CRPS) of the density
forecast $f(x)$ whose cumulative distribution is $F(x)$. The CRPS is a function of $F$ and
the verification $X$ and is defined by (Gneiting and Raftery, 2007)

\[ CRPS(F, X) = \int_{-\infty}^{\infty} \big(F(\tau) - I\{\tau \geq X\}\big)^2\,d\tau. \]

The above score may equivalently be written as

\[ CRPS(F, X) = \int_{-\infty}^{X} F^2(\tau)\,d\tau + \int_{X}^{\infty} \big(F(\tau) - 1\big)^2\,d\tau. \tag{18} \]

It follows from (18) that

\[ E[CRPS(F, X)] = \int_{-\infty}^{\infty} p(x) \int_{-\infty}^{x} F^2(\tau)\,d\tau\,dx + \int_{-\infty}^{\infty} p(x) \int_{x}^{\infty} \big(F(\tau) - 1\big)^2\,d\tau\,dx, \tag{19} \]

where $p(x)$ is the true (or target) density function. If $P(x) = \int_{-\infty}^{x} p(\tau)\,d\tau$, we can then
apply the integration by parts formula to each term on the right hand side of (19) to
obtain

\[
\begin{aligned}
\int_{-\infty}^{\infty} p(x) \int_{-\infty}^{x} F^2(\tau)\,d\tau\,dx
&= \left[ P(x) \int_{-\infty}^{x} F^2(\tau)\,d\tau \right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} P(x) F^2(x)\,dx \\
&= \int_{-\infty}^{\infty} F^2(x)\,dx - \int_{-\infty}^{\infty} P(x) F^2(x)\,dx
\end{aligned}
\]

and

\[
\begin{aligned}
\int_{-\infty}^{\infty} p(x) \int_{x}^{\infty} \big(F(\tau) - 1\big)^2\,d\tau\,dx
&= \left[ P(x) \int_{x}^{\infty} \big(F(\tau) - 1\big)^2\,d\tau \right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} P(x) \big(F(x) - 1\big)^2\,dx \\
&= \int_{-\infty}^{\infty} P(x) \big(F(x) - 1\big)^2\,dx,
\end{aligned}
\]

whence

\[ E[CRPS(F, X)] = \int_{-\infty}^{\infty} P(x) \big(1 - P(x)\big)\,dx + \int_{-\infty}^{\infty} \big(F(x) - P(x)\big)^2\,dx, \tag{20} \]

after some algebraic manipulation. Define $F(x) = P(x) + \Gamma(x)$, where $\Gamma(x) = \int_{-\infty}^{x} \gamma(\tau)\,d\tau$
and $f(x) = p(x) + \gamma(x)$. If we also define $G(P) = \int P(x)(1 - P(x))\,dx$, then equation (20)
can be re-written as $E[CRPS(F, X)] = G(P) + \|\Gamma\|_2^2$. We have thus proved the following
proposition:

Proposition 3.6 The Continuous Ranked Probability Score does not distinguish
between distributions whose cumulative errors from the target distribution are equal in the
sense of the L2 norm.

Consequently, the CRPS does not distinguish between density forecasts whose errors
from the target density differ by the sign. To see this, consider two forecasts whose
errors from the target density are $\gamma_1(x)$ and $\gamma_2(x)$ respectively, with $\gamma_1(x) = -\gamma_2(x)$. It
then follows that

\[
\begin{aligned}
\int_{-\infty}^{\infty} \Gamma_1^2(x)\,dx - \int_{-\infty}^{\infty} \Gamma_2^2(x)\,dx
&= \int_{-\infty}^{\infty} \big(\Gamma_1^2(x) - \Gamma_2^2(x)\big)\,dx \\
&= \int_{-\infty}^{\infty} \big(\Gamma_1(x) + \Gamma_2(x)\big)\big(\Gamma_1(x) - \Gamma_2(x)\big)\,dx \\
&= 0,
\end{aligned}
\]

since $\gamma_1(x) = -\gamma_2(x) \Rightarrow \Gamma_1(x) + \Gamma_2(x) = 0$.
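The decomposition in equation (20) and the resulting indifference to the sign of the error can be illustrated numerically. The sketch below evaluates $E[CRPS]$ from the definition, using $E[(F(t) - I\{t \geq X\})^2] = F^2(t)(1 - P(t)) + (1 - F(t))^2 P(t)$, and compares it with $G(P) + \|\Gamma\|_2^2$. The densities, grid and helper names are illustrative assumptions, not part of the original text.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p(x, a=0.3):
    """Assumed target density."""
    return phi(x) * (1.0 - a * math.tanh(x))


def gamma(x, b=0.2):
    """Assumed odd error that integrates to zero."""
    return -b * phi(x) * math.tanh(x)


LO, HI, N = -10.0, 10.0, 4000
STEP = (HI - LO) / N
XS = [LO + k * STEP for k in range(N + 1)]


def cdf(density):
    """Cumulative trapezoid approximation of the CDF on the grid XS."""
    values = [0.0]
    for k in range(1, len(XS)):
        values.append(values[-1] + 0.5 * STEP * (density(XS[k - 1]) + density(XS[k])))
    return values


def trapz(values):
    """Trapezoid rule for samples on the uniform grid XS."""
    return STEP * (sum(values) - 0.5 * (values[0] + values[-1]))


def expected_crps(F, P):
    """E[CRPS(F, X)] from the definition, with X distributed according to P."""
    return trapz([Fk * Fk * (1.0 - Pk) + (1.0 - Fk) ** 2 * Pk for Fk, Pk in zip(F, P)])


P = cdf(p)
F_plus = cdf(lambda x: p(x) + gamma(x))
F_minus = cdf(lambda x: p(x) - gamma(x))

crps_plus = expected_crps(F_plus, P)
crps_minus = expected_crps(F_minus, P)

# Right hand side of equation (20): G(P) + ||Gamma||_2^2
G_P = trapz([Pk * (1.0 - Pk) for Pk in P])
gamma_norm_sq = trapz([(Fk - Pk) ** 2 for Fk, Pk in zip(F_plus, P)])
```

On this grid `crps_plus` and `crps_minus` coincide up to rounding and both match `G_P + gamma_norm_sq`, so the CRPS does not separate the two forecasts.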

As a final remark, we note that the second term in the expectation of CRPS in (20)
somewhat resembles the mean squared error criterion discussed in Corradi and Swanson
(2006). The mean squared error of the forecast $F(x)$ is $E[\Gamma^2(X)] = \int p(x) \Gamma^2(x)\,dx$.
Likewise, the mean squared error criterion does not distinguish between forecasts whose
errors from the target density differ by a sign (i.e. $\gamma_1(x) = -\gamma_2(x)$) because

\[
\begin{aligned}
\int_{-\infty}^{\infty} p(x) \big(\Gamma_1^2(x) - \Gamma_2^2(x)\big)\,dx
&= \int_{-\infty}^{\infty} p(x) \big(\Gamma_1(x) + \Gamma_2(x)\big)\big(\Gamma_1(x) - \Gamma_2(x)\big)\,dx \\
&= 0.
\end{aligned}
\]


4 Discussion and Conclusions 

This manuscript contrasted how certain scoring rules would rank competing forecasts of 
specified departures from the target distribution. In the categorical case, we considered 
the Brier Score, the logarithmic scoring rule and the spherical scoring rule, focusing on 
the binary case. Given two forecasts whose errors from the target distribution differ 
only by the sign, we found that the logarithmic scoring rule prefers the higher entropy 
distribution whilst the spherical scoring rule prefers the lower entropy distribution. The 
Brier score does not distinguish the two distributions. The logarithmic scoring rule 
selects a lower entropy forecast only if it is nearer to the target distribution in the sense 
of the $L^2$ norm and vice versa for the spherical scoring rule.
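The binary comparison summarised above can be illustrated with a small numerical sketch. It assumes that each scoring rule is written as a loss to be minimised (Brier: squared distance to the outcome indicator; logarithmic: negative log probability; spherical: negative probability divided by the Euclidean norm of the forecast), and it uses an illustrative target $p = (0.3, 0.7)$ with errors of $\pm 0.1$. These values and the function names are assumptions, not taken from the paper.

```python
import math

def expected_brier(q, p):
    """Expected Brier loss: sum_i p_i * sum_j (q_j - [i == j])^2."""
    return sum(
        p[i] * sum((q[j] - (1.0 if i == j else 0.0)) ** 2 for j in range(len(q)))
        for i in range(len(p))
    )

def expected_log(q, p):
    """Expected logarithmic loss: -sum_i p_i * log(q_i)."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q))

def expected_spherical(q, p):
    """Expected spherical loss: -sum_i p_i * q_i / ||q||_2."""
    norm = math.sqrt(sum(q_i * q_i for q_i in q))
    return -sum(p_i * q_i for p_i, q_i in zip(p, q)) / norm

def entropy(q):
    """Shannon entropy in nats."""
    return -sum(q_i * math.log(q_i) for q_i in q if q_i > 0.0)

p = (0.3, 0.7)                      # target distribution (illustrative)
eps = 0.1                           # size of the forecast error (illustrative)
q_plus = (p[0] + eps, p[1] - eps)   # error +eps: closer to uniform
q_minus = (p[0] - eps, p[1] + eps)  # error -eps: further from uniform

for name, q in (("q_plus", q_plus), ("q_minus", q_minus)):
    print(name,
          "entropy:", entropy(q),
          "Brier:", expected_brier(q, p),
          "log:", expected_log(q, p),
          "spherical:", expected_spherical(q, p))
```

With these assumed values the expected Brier losses coincide, the expected logarithmic loss is smaller for the higher entropy forecast `q_plus`, and the expected spherical loss is smaller for the lower entropy forecast `q_minus`.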

We extended the investigation from binary forecasts to the continuous case, where
we considered the quadratic score, the logarithmic scoring rule, the spherical scoring rule and
the Continuous Ranked Probability Score (CRPS). Just like the Brier score in the binary
case, the quadratic score does not distinguish between forecasts with equal $L^2$ norms
of their errors from the target distribution. On the other hand, given two density
forecasts whose errors from the target density differ by a sign, the logarithmic scoring
rule prefers the distribution with higher entropy whilst the spherical scoring rule prefers
the one with lower entropy; bear in mind that higher entropy corresponds to more
uncertainty (Shannon, 1948). The CRPS is indifferent to forecasts whose errors from
the target density differ by a sign.

Some have criticised the logarithmic scoring rule for placing a heavy penalty on
assigning zero probability to events that materialise (e.g. Boero et al., 2011; Gneiting and Raftery,
2007); but assigning zero probability to events that are possible is also discouraged by
Laplace's rule of succession (Jaynes, 2003). What has been shown here is that the
logarithmic scoring rule is good at highlighting forecasts that are less uncertain than ideal
forecasts. Such forecasts may have to be dealt with appropriately. One way of dealing
with such forecasts is discussed in Machete (2012). Nonetheless, given two density
forecasts, the logarithmic scoring rule does not just reject the more extreme in the sense
of entropy: if both forecasts are more uncertain than the ideal forecast, the logarithmic
scoring rule will tend to prefer the less uncertain of the two.

Does our consideration of departures from ideal forecasts amount to advocating 
for dishonesty by forecasters? Not at all. We are merely making an observation that 
forecasters can honestly report predictive distributions that have departures from ideal 
forecasts. Although strictly proper scoring rules encourage forecasters to be honest when 
they report their best judgements, they do not guarantee that the reported forecasts 
will coincide with ideal forecasts. Our point then is that using a given scoring rule may 
inherently favour departures from ideal forecasts in one direction more than in another. 
Therefore, when one selects a scoring rule to estimate distribution parameters or choose 
between two competing experts, it amounts to deciding preferred departures. 

Which scoring rule one should choose will depend on the application at hand. Combining
the insights on scoring rules set forth in this paper with an understanding of the
situation at hand can help decide which scoring rule is most appropriate. The issue to
consider may be decisions associated with high impact, low probability events. To illustrate
our point, let us consider inflation forecasting. It is undesirable to overestimate
the probability of extreme inflation because of the panic it can create as buyers rush
to spend now before prices rise. In order to manage people's expectations better, the
spherical scoring rule is preferable in this case. As another example, consider seasonal
forecasts of drought in the UK, which is arguably a rare event. Underestimating the
probability of this event could result in water shortages since water companies might
not be stringent on water usage. In this case, the logarithmic scoring rule is preferable.